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Abstract 

Is it possible to discover a local outbreak of an infectious disease in 
a region for which data are missing, but which is at work as a disease 
spreader? Node discovery for the spread of an infectious disease is defined 
as discriminating between the nodes which are neighboring to a missing 
disease spreader node, and the rest, given a dataset on the number of 
cases. The spread is described by stochastic differential equations. A 
perturbation theory quantifies the impact of the missing spreader on the 
moments of the number of cases. Statistical discriminators examine the 
mid-body or tail-ends of the probability density function, and search for 
the disturbance from the missing spreader. They are tested with compu- 
tationally synthesized datasets, and applied to the SARS outbreak and 
flu pandemic. 

1 Introduction 

No sooner had a new year begun in 2003 than citizens were seized with panic 
in Guangdong in south China. Hundreds were suffered from a pneumonia-like 
strange disease, some of which had been dead. Both Chinese government and 
Chinese media remained silent all the time as to the risk of a possible epidemic. 
No one in the rest of the world knew there was any real cause for alarm. But 
in March, local outbreaks of a mysterious disease were reported in Hong Kong 
and Southeast Asian countries. The World Health Organization (WHO) issued a 
global alert. Even then, health authorities could not reveal where the disease had 
come from. This story at the onset of the Severe Acute Respiratory Syndrome 
(SARS) outbreak poses an interesting question. Is it possible to discover the 
presence of a missing disease spreader from the surveillance records on the cases 
in other regions? This study addresses such a node discovery problem for the 
spread of an infectious disease. 

The spread of an infectious disease is a stochastic phenomenon in a spa- 
tially heterogeneous medium. Many mathematical models of disease transmis- 
sion [Riley 20^ rely on an epidemiological compartment model and a meta- 
population network model in formulating stochasticity and spatial heterogene- 



ity [Dangerfield 2009 , |Siinoes 2008] . These models are described by a set of 



stochastic differential equations [Hufnagel 2004| . The analysis of the spread 
includes many tasks from reproduction to prediction. Reproduction of the ac- 
tual spread [Christensen 2010] . |Isham 2010] is essential in understanding the 
role of a so-called super-spreader [Fujic 2007], [Small 2006] and epidemiologi- 
cal thresholds [Parshani 2010) . [Coli zza 200'3- This is a forward analysis. An 
inverse analysis includes estimating the transmission parameters from obser- 
vation [Walker 2010) . [Keeling 2004| . Network profiling estimates the effec- 
tively decisive topology of a transportation network which governs the spread 
[Maeno 2010| . Network inference in a communication network [Rabbat 2008| is 
a similar problem. Statistical learning and computational Monte-Carlo simula- 
tion contribute to developing a reliable bio-surveillance system in a noisy envi- 
ronment [Reis 200"3] . detecting abnormal events [Takeuchi 2006] as an omen of 
the outbreak [Lu 2010] . and predicting the spread to aid the health authorities 
in containing the outbreak [Colizza 2006[ . None of these, however, addresses the 
problem. Node discovery problem for a social network [Maeno 2009] is discov- 
ering a person who does not appear in the logs on communication, but actually 
influential to others in an organization. The substantial nature of the problem 
is the same as this. But the mathematical solution should be different entirely 
because of the difference in the mechanism of the spread. 

In this study, a perturbation theory quantifies how the perturbation which 
a missing spreader node exerts disturbs the growth of the number of cases. The 
missing spreader may be the place where the first patient appears (an index 
node) or not (an intermediate node). Its presence impedes the reproduction 
of the observed spread by an unperturbed mathematical model. The irrepro- 
ducibility is rather the clue to solve the problem. Two statistical discriminators 
are invented. Their role is discriminating between the nodes which are neighbor- 
ing to the missing spreader node, and the rest (non- neighboring nodes), given 
a dataset. It is a collection of the time sequence data on the number of cases, 
or on the cumulative number of new cases, at the nodes within the scope of 
surveillance in the early growth phase of the spread. The network topology and 
transmission parameters may be given as an input to the discriminators. Or 
they may be unknown and must be estimated. The discriminators examine the 
mid-body or the tail-ends of the probability density function of the number of 
cases, and search for disturbed time sequence data to which the perturbation 
gives rise. The mid-body discriminator is founded on the Kolmogorov-Smirnov 
test. The tail-end discriminator is founded on the Chauvenet rejection test. 
The discriminators are tested with a number of computationally synthesized 
datasets, and applied to the WHO datasets on the SARS outbreak and on the 
flu pandemic (HlNl swine influenza A) in 2009. 

2 Problem 

The mathematical model of the spread in this study is a special case of a 
stochastic reaction-diffusion process, the integration of a standard epidemio- 
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logical SIR compartment model and a meta-population network model. The 
meta-population network model |BaroncIielli 2008] sub-divides the entire pop- 
ulation into distinct sub-populations in many geographical regions. The geo- 
graphical regions are the nodes Ui {i = 0, 1, • • • ). The transportation between 
two regions is a pair of unidirectional links. The adjacency matrix I, whose i-th 
row and j-th column element is kj , determines the network topology. If a link 
from Hi to Uj is present, lij is 1. If absent, lij is 0. The parameter 7 is a matrix 
whose z-th row and j-th column element jij is the probability at which a person 
moves from Ui to nj per a unit time. In many application areas, an empirical 
law = ^ijil) determines 7 as a function of L In the following, 7 and I are 
interchangeable. 

In the SIR compartment model [Keeling 2008 the state of a person changes 
from a susceptible state, through an infectious state, to a removed (recovered) 
state. The quantity Si(t) is the number of susceptible persons at a node rii at 
time t. liit) is the number of infectious persons. Riit) is the number of recovered 
persons. Ji{t) is the cumulative number of new cases until t. The parameter a 
represents the probability at which an infectious person contacts a person and 
infect the person per a unit time. The parameter (3 represents the probability 
at which an infectious person recovers per a unit time. These transmission 
parameters do not depend on time and sub-populations. The reproductive ratio 
r is defined hy r — a/ P [Lipsitch 2003] . Movement, infection and recovery are 
Markovian stochastic processes which are governed hy = {a, /3, 1}. 

Node discovery is a problem to determine whether the nodes no, n-i, • • • , tt-tv-i 
which appear in a given dataset is neighboring to a missing spreader node tin 
or not. The neighboring nodes are those having links to tin- The dataset is 
either Ii{td), or AJi{td) = Ji{td+i) - Ji{td), where « = 0, 1, • • • ,iV - 1 denotes 
the nodes, and d — 0,1, ■ ■ ■ ,D~1 denotes observations. The interval between 
observations is At = td+i — td- The number of data is ND. The network topol- 
ogy and the transmission parameters 6 = {a, P, Uj} for i, j = 0, 1, • • • , N — 1 are 
given under some conditions, and unknown under the other conditions. No other 
information is available. The node n^v is either present or absent. If present, it 
is either an index node or an intermediate node. The parameters and l^j are 
unknown under any conditions. An empirical law = is postulated in 

this study. Without the law, when were unknown, the continuous parameter 
7 would be estimated, rather than the binary parameter I. 



3 Method 

3.1 Probability density function 



The time evolution oiIi{t), or Ji{t) is given by Langevin equations Hufnagel 2004 



They are a system of stochastic differential equations [Kampen 2007| . The 
ensemble of an infinite number of sample trajectories hit) is equivalent to 
the time dependent joint probability density function P{I, t) of the probabil- 
ity variables I = (Jo,/i,---). Stochastic differential equations for Ii{t) are 
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converted to a partial differential equation for P{I,t). This Fokker-Planck 
equation [Kanipen 2007| is converted to ordinary differential equations to cal- 
culate the moments of li one order after another. The first order moments 
mi{t\6) = J IiP{I,t)dI are the mean of li at t. The second order moments 
Vij{t\6) are the covariance about the mean between li and Ij. The third or- 
der moments Sijk{t\9) are the skewness about the mean among li, Ij, and 
Ik- The fourth order moments Kijki(t\9) are the kurtosis about the mean 
among li, Ij, Ik, and The fifth and higher order moments are not ana- 
lyzed. The formulae for the moments are listed in Appendix |^ The observa- 
tions Ii(td) at td are the initial condition to obtain the moments at td+i- In 
the following, the lower order moments refer to the mean and variance, and 
the higher order moments refer to the skewness and kurtosis. For small At, 
Sijkitd+i\9),Kijki{td+i\0) < mi{td+i\9),Vij{td+i\9)- The probability density 
function is a multi-variate normal distribution in eq.([T]). The i-th element of 
the row vector m{td+i\9) is mi{td+i\9). The i-th row and j-th column element 
of the matrix v(td+i\9) is Vij{td+i\9). 

P{I,td+i) « PN{r,m{td+i\9),v{td+i\9)). (1) 

P{I, t) depends on time and many probability variables. Analyzing a dataset 
is an involved task. Time dependent conditional z-score for li resolves the 
involvedness. The variables li at td+i are converted to the variables Zi defined 
by eq.©. 

. i^^-^?itd,m 

In eq.©, the mean mf{td+i\9) and the variance vf^{td+i\9) are conditioned 
on the observation for the rest of the variables Ij — {I^, ■ ■ ■ , h-i, h+i, ■ ■ ■ , In-i) 
at td+i- Generally, given the conditional probability density function for li 
is a uni-variate normal distribution Pj^{Ii;mf ,v^^) if / obeys a multi-variate 
normal distribution Pi;!{I;m,v). The conditional mean mf{td+i\9) is given by 
eq.Q. It is a sum of the unconditional mean rrii and a term dependent on the 
difference between the observation and the expected value (/ — m) for the rest 
of the variables at td+i- The column vector (J — m)'^ is a transpose of a row 
vector I — m. 

m'^{td+i\9) m,{td+i \9) + va{td+i\9)vji[td+i \0)-\l-, ~ mj{td+i \9)f. (3) 

The conditional variance vfi{td+i\9) is a Shur compliment of vji in v. It is 
given by eq.(|4]). Because the observation reduces uncertainty, v^^ is smaller than 
Vii by the amount determined by the second term. 

v?^{td+i\9) = vu{td+i\9) - vdtd+i\9)vji{td+i\9)-^vj,{td+i\9). (4) 

In eq. ([3]) and (|4]) , the unconditional mean is partitioned into the i-th element 
and a row vector. The unconditional covariance is partitioned into four sub- 
matrices (1 X 1, 1 X (iV - 1), (iV - 1) X 1, (A^ - 1) X (iV - 1) matrices). These 
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are given by eq.®. 



m^(rn,,mi),v={^ (5) 



The non-uniform growth at different nodes at different times is absent in Zi. 
Eq.([T]) becomes eq.®, which is vahd for the all nodes Ui at any time t as far as 
the approximation that / obeys a multi-variate normal distribution holds true. 

P(z„t<j+i|7-,6/) =Pn(z»;0,1). (6) 

In the above discussion, it is assumed that 9 is known and Ii{td) is given as 
a dataset. If is unknown, the network topology and transmission parameters 
must be estimated from a given dataset. A well-known statistical technique for 
this purpose is the maximal likelihood estimation or the maximal a posteriori 
probability estimation .Hastie 2001, . The true parameter is substituted by 
the estimator 9. If AJi{td) is given, instead of Ii{td), AJi{td) is converted to 
Ii{td) by eq.([7]). The all formulae for Ii{td) are applicable then. 

Eq.Q is a discrete-time approximation to the stochastic differential equa- 
tion. The fluctuation terms are simply discarded. The motivation to use eq.Q 
is as follows. The model in this study is a hidden Markovian model in the sense 
that the observation AJi{td) is determined by the unobserved states of vari- 
ables Ii{t) in a Markovian stochastic process. It is known generally intractable 
to estimate the model with many continuous-time dependent (so-called hetero- 
skedastic) continuous latent variables. Such an approximation as eq.(I7]) is criti- 
cal in the estimation. If the true value a in eq.([71) is unknown, it is substituted by 
the estimator a. The mathematical details for these estimation and conversion 
are presented in [Maeno 2010] . 



3.2 Perturbation theory 

A perturbation theory quantifies the impact of a missing spreader node on 
its neighboring nodes and non-neighboring nodes. Let us investigate a simple 
network which consists of a missing spreader node rig, a neighboring node rin 
which is connected to rig, and a non- neighboring node ria which stays apart from 
rig. That is, = 2 and n2 — rig. The three nodes are connected with two links 
between Un and ria, and rin and rig- Assume that jna = 7an = 7, 7ns — 7sn = 7', 
and 7as = 7sa = 0. The presence of rig means 7' > 0. Given = {a, j3, 7}, how 
does non-zero 7' disturb the moments of /„ and /a? 

The formulae for the moments in this network are listed in Appendix [BJ The 
mean for ?i„ changes in the direction determined by the sign of In{td) — Is{td)- 
The variance increases in proportion to 7'. The terms dependent on 0(7'^) 
appear in the formulae for the skewness and kurtosis. The mean and variance 
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of the z-score for rin are given by eq.® and The quantity (z) is the average 
over the observations at different times. In eq.([2]), the disturbance is included 
in the observation li whose moments are mi{td+i\9,^') and Vii(td+i\0-,j')- On 
the other hand, nothing can be assumed about the presence of Hs in calculating 
mf and wj^. These are the values predicted from mi{td+i\0, 0) and Vii{td+i\9, 0) 
when 7' — 0, given 6 = {a, /3, 7}. 



As 7' increases, the difference between the probability density function of Zn 
and the standardized normal distribution becomes more significant. In contrast, 
the mean, variance, and skewness for do not change at all. Interestingly, the 
kurtosis increases in proportion to 7'. The presence of Ug disturbs the fourth and 
higher order moments for ria- In terms of the z-score for ria, (za) = and ((za — 
(•Za))^) = 1- The coupling with a spreader and non- neighboring nodes emerges 
at this order. But such a signal from non-zero 7' may be too weak to detect 
from ria. The above discussion implies that the normality of the probability 
density function for neighboring nodes is vulnerable to the perturbation which 
a missing spreader node exerts, but that the impact on the normality for non- 
neighboring nodes is not salient. This is the basis to discriminate between the 
neighboring nodes and non-neighboring nodes statistically. 

Let us extend the theory to multiple non-neighboring nodes. The entire set 
of nodes is treated as a big node Ua- The number of infectious persons at the 
individual nodes is roughly the same as /„. Assume there are k such nodes. 
/a ~ kin holds. If Us is an index node, Ig 3> In holds. The variance of Zn in 
eg. (0i is sufficiently large as a signal when k < j'is/jin- Any k satisfy this 
criterion unless 7' is extremely small. If Us is an intermediate node, Ig ~ In 
holds. The variance is large enough when k < (27' — 7 — a — Small k, 

rather than small N , seems a prerequisite for discrimination. 

When a household or a hospital ward form a sub-population, the population 
of a node is very small. Most of the sub-population are likely to get infected 
immediately at the onset of infection. The day-by-day fluctuating growth of the 
number of patients, in which the trace of perturbation is left, will not be observed. 
The formulae in this study may not work under this condition. Moreover as an 
extreme, a social network model, where an individual is a node, is suitable to 
analyze the transmission of such diseases as Acquired Immune Deficiency Syn- 
drome (AIDS) Wotterat 1999f . Formulating a stochastic model and perturbation 
theory for the social network model is beyond the scope of this study. 





((z„-(z„)n = l + 



7'(/„ + /s) 



(9) 



(a + /3)/„ -I- 7(/n + /a) ■ 
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3.3 Statistical discriminator 



3.3.1 Mid-body discriminator 

The mid-body discriminator is founded on the Kolmogorov-Smirnov test. The 
Kolmogorov-Smirnov test [Press 2007] estimates the minimal distance between 
two cumulative density functions. In many applications, the test is used for a 
one sample hypothesis testing where the null hypothesis is that the cumulative 
density function drawn from a dataset empirically is the same as a given refer- 
ence cumulative density function. A typical reference function is a cumulative 
normal distribution with a given mean and variance. The test is more sensitive 
to the difference in the mid-bodies of the cumulative density functions than that 
in the tail-ends. In other words, it is more sensitive to the difference in the lower 
order moments. Let z(d) denote multiple observations (c? = 0, 1, • • • ,D — 1) for 
a single probability variable z. The empirical cumulative density function F(z) 
is given by eq. pH]) . The function H{x) for a real argument a; is a Heaviside's 
function whose value is 1 when x > 0, and when a; < 0. 

Fiz) = ^Y.^('^-<d)). (10) 

d=Q 

The test statistic T is given by eg. pTj) . Fsi{z) is the reference cumulative 
density function which z obeys in the hypothesis. This is the minimal distance 
between the cumulative density functions. The supremum sup x for a real vari- 
able X is the least upper bound of x. 

r= /D~Tsup|F(z)-Fr(z)|. (11) 

z 

The null hypothesis is rejected at the significance level of a when T > Ka 
where Ka — K~^{\ — a). K(x) is a cumulative density function of the Kol- 
mogorov distribution for a probability variable a: > 0. Ka — 1-38 for a = 5%, 
and Ka = 1.63 for a — 1%. Eq. ([TO|) is applicable to calculating the empirical 
cumulative density function Fi(z) of the 2;-score for rii in eq.Q from the obser- 
vation oi li a,t t = td {d = 0,1, D — 1). Because of the property in eq.®, 
the reference cumulative density function is the cumulative standard normal 
distribution Fr(z) = (1 + erf(z/\/2))/2. The test statistic T in eq.lHIl) for 
becomes Ti in eq. (fT2l) . 

^ V^D^sup 1-^ X] - ^^(Wi)) - i±^^:|^|. (12) 

It is not obvious which value of the threshold Ka is the most suitable be- 
cause the normality for non-neighboring nodes and non-normality for neighbor- 
ing nodes are just an approximation. On the other hand, it is difficult to derive 
an analytical formula for an appropriate discrimination threshold as a function 
of the parameters and experimental conditions. Searching for the threshold 
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T* experimentally is rather practical. If Ti > T*, the mid-body discrimina- 
tor determines that the perturbation from an unknown origin disturbs li, and 
consequently that Ui is neighboring to a missing spreader node. 

3.3.2 Tail-end discriminator 

The tail-end discriminator is founded on the Chauvenet rejection test. The 
Chauvenet rejection test [Taylor 1996| detects an outlier in a given dataset. 
It was invented as a criterion to assess statistically whether particular one- 
dimensional numerical data are likely to be spurious or not, and is used widely 
in experimental physics and chemistry today. First, the mean and variance of a 
given dataset are calculated. The probability at which the individual datum is 
obtained under the calculated mean and variance is evaluated. Then, the datum 
is considered to be an outlier if the product of the probability and the number 
of data in the dataset is less than a given threshold. The threshold is 0.5 in 
the conventional Chauvenet rejection test. For example, the data are spurious 
if the probability is less than 0.05 when the number of data is 10. 

The test statistic L is the likelihood function [Hastie 2001] , which is the con- 
ditional probability of the data as a function of the parameters. The conditional 
probability will be noticeably small if the data are spurious, that is, the data 
lies in the tail-ends. The test is sensitive to the anomaly in the higher order 
moments. The test statistic Li of the z-score for rii is defined by ea. ((T5)) . 

U = -^—j ^nP{z,,td+i\Ij,e) = -^—j J2 ln^N(^^;0,l). (13) 

According to the conventional Chauvenet rejection test, the discrimination 
threshold for the test statistic is C = In 0.05 = —3 when N = 10. Recall 
the discussion on the significance of the absolute value of Ka for the mid-body 
discriminator. Instead of relying on C in the conventional Chauvenet rejection 
test, searching for an appropriate discrimination threshold L* experimentally 
is rather appropriate. If Li < L*, the tail-end discriminator determines that a 
node rii is a neighboring node. 

4 Experiment 

4.1 Computationally synthesized dataset 

The performance of the discriminators is studies with a number of computation- 
ally synthesized test datasets. The test datasets are synthesized by numerical 
integration j Kloeden 1992] of a set of Langevin equations in eq. (|17p for ran- 
dom network topologies and transmission parameters. The network is a Erdos- 
Renyi model. The probability at which a link is present between rii and rij 
{lij = Iji = 1) is a constant {ki)/{N — 1). The nodal degree of a node rii is given 
by ki = J2j hj- The average over the all nodes is (ki). 
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Table 1: Four possible outcomes from a discriminator. 





Actual value 


Neighboring 


Non- neighboring 


Discriminator 
output 


Neighboring 


True positive A'tp 


False negative Npp 


Non- neighboring 


False positive A^fn 


True negative Atn 



It is postulated that the total number of persons who moves from m to rij per 
a unit time is proportional to y/kikj if a link is present. This is valid generally for 
the world-wide airline transportation network [Barrat 2004] . Eq. ((T4)) determines 
jij as a function of I. The fraction of persons who outgoes per a unit time is a 
constant 7 over the network. 



7. = r.,(0 = ^^^^^7. (14) 

It is also postulated that the initial population Pi{0) = Si{0) + li{0) + Ri{0) 
of a node is proportional to the total number of persons who outgoes from 
the node per a unit time |Maeno 2010| . -Pi(O) is given by eq. p^ . The total 
population is P = lO^iV in the experiment. This is relevant in synthesizing the 
test datasets, but not necessary in discrimination. 



pm = (15) 

Z^i.j Hjy '^i'^j 

A receiver operating characteristic curve jFawcett 2006] is drawn to evaluate 
the accuracy of discrimination. In signal processing, a receiver operating char- 
acteristic curve is a plot of the true positive ratio i?TP on the vertical axis and 
the false positive ratio i?FP on the horizontal axis for a binary discriminator as 
its discrimination threshold is varied. It is generally a concave function. The 
discrimination threshold is T* for the mid-body discriminator, and L* for the 
tail-end discriminator. There are four possible outcomes from the discrimina- 
tor. They are summarized in Table [TJ True positive and true negative are right 
answers, but false positive and false negative are wrong answers. The number of 
the true positive is A^j-p- The number of nodes is A = Axp + A^fn + A^fp + A^tn ■ 
The ratios are defined by ea. p^ . i?TP is called recall alternatively, and i?FP is 
called fallout. 

i?TP — 1 Rw — . (16) 

A^TP+A^FN A^FP + A^TN 

The ratios take the value between and 1. If the discriminator works ideally 
excellently, A^tp — kN, A^fn — 0, App = 0, and A^tn = N — kj^. The curve for 
the ideal discriminator degenerates to the upper left corner (-Rfp, ^tp) = (0, 1). 
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The curve for a randora discriminator is a straight hne Rtp = Rfp between 
(0,0) and (1,1)- The curve for a more excellent discriminator comes closer to 
the upper left corner. The closeness of i?TP — Rfp to its ideal value of 1 is a 
scalar indicator of accuracy. It is used as an objective function to search for the 
optimal thresholds L* and T* experimentally. 

Figure [T] a, b shows the receiver operating characteristic curves for = 10 
when the missing spreader node riio is an index node. The tail-end discriminator 
works excellently both when 6 is given and unknown. The best performance 
is Rtp — Rfp ~ 1- The mid-body discriminator is the most suitable when 
9 is given. The best performance is Rtp — Rfp ~ 1 for given 9, and « 0.2 
for unknown 9. Figure [1] c, d shows the curves for = 30 when the missing 
spreader node n^o is an index node. Figure [T]e, f shows the curves for = 10 
when the missing spreader node nio is an intermediate node. Discrimination is 
not as excellent as that in a, b, but excellent moderately. The best performance 
of the tail-end discriminator is Rtp — -Rfp ~ 0.9 for given 6, and « 0.7 for 
unknown 6. The best performance of the mid-body discriminator is Rtp — 
Rpp sa 0.65 for given 6. The intermediate node and its neighboring nodes look 
alike as sinks of infected travelers indistinguishably. This is in strong contrast 
to the index node which is a salient source of them. 

When 9 is unknown, the estimation searches for a network topology of N 
nodes and transmission parameters, whose behavior bears the closest resem- 
blance to /o, /i, • • • , In-1 in the actual network topology of -I- 1 nodes. In 
the simple network in 13.21 this results in mi(td+i\9,0) ^ mi(td+i\9,"f') and 
Vij{td+i\9,0) ^ Vij{td+i\9,-/'), consequently (2„) ~ and ((zn - (^n))^) ^ 1, 
rather than eq.® and This is confirmed by measuring the moments with 
the datasets. The measured moments are shown in Appendix [C] When a miss- 
ing spreader node is absent, the moments for given 9 are almost the same as 
those for the estimator 9. The estimator reproduces the spread accurately. The 
moments are (m^) ss and (vu) = 1. On the other hand, the difference in every 
moment between neighboring nodes and non-neighboring nodes for the estima- 
tor 9 is much smaller than that for given 9 when the missing spreader node is 
an index node. The difference in (m^), and that in (vu) are so small that Un 
can not be distinguished from Ua by examining the lower order moments. This 
is the reason why the mid-body discriminator is not suitable if 9 is unknown. 
Furthermore, the difference in every moment between neighboring nodes and 
non-neighboring nodes is particularly small when the missing spreader node is 
an intermediate node. Discovering a missing intermediate node is a hard prob- 
lem. 

Figure[2]a, b shows the curves when the number of data is D — 33. Accuracy 
is the same as that in Figure [T] a, b. It does not depend on these experimental 
conditions. Figure [5] c, d shows the curves when the reproductive ratio is 
r = 8, which is four times larger than the value in Figure [1] a, b. The best 
performance of the tail-end discriminator is Rtp — Rfp ~ 0.95 for given 9, and 
« 0.85 for unknown 9. The best performance of the mid-body discriminator 
is Rtp — Rfp ~ 0.95 for given 9. There is a small degradation in accuracy. 
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but discrimination is still excellent. Figure [U e, f shows the curves when the 
fraction is 7 = 0.4, which is four times larger than the value in Figure [T] a, b. 
The accuracy is the same as that in Figured] a, b. 

Figure [3] a, b shows the curves when AJi{td), instead of Ii{td), is given as 
a dataset. The missing spreader node nio is an index node. The experimental 
conditions are the same as those in Figure [T] a, b. These two spreads are 
identical. Discrimination from AJi{td) is not so excellent as than from Ii{td) 
in Figure [T] a, b. The best performance of the tail-end discriminator is Rtp — 
i?FP ~ 0.55 for given 9, and w 0.85 for unknown 0. The best performance of 
the mid-body discriminator is Rtp — Rfp ~ 0.65 for given 6. Figure [3] c, d 
shows the curves under the condition where the missing spreader node nio is 
an intermediate node. The best performance of the tail-end discriminator is 
i?TP — -Rfp ~ 0.4 both for given and unknown 9. The best performance of the 
mid-body discriminator is Rtp — Rfp ~ 0.6 for given 9. The performance of 
the mid-body discriminator does not change much between a, b and c, d. 

Somehow these tendencies are contrary to the results so far. Recall that 
the fluctuation term in eq. is discarded in the approximation of eq. (O . The 
term is more complex than a mere Gaussian white noise because its amplitude 
includes a/ It (t) , and the ensemble of sample trajectories Ii{t) does not obey a 
multi-variate normal distribution in eq.(IT]) for large t. Eq.Q is correct on the 
average, but the conversion of the higher order moments from AJi{td) to Ii{td) 
is inaccurately distortive. It is the reason why the tail-end discriminator is only 
moderately excellent, and the shape of the curve looks awkward when 9 is given. 
On the other hand, when 9 is unknown, the estimator 9 may be able to make 
up for the distortion in the higher order moments. The tail-end discriminator 
with the estimator 9 outperforms the discriminators for given 9. Even if the 
true parameters are given, adjusting the parameters is rather indispensable for 
excellent discrimination from AJi{td). Substantial improvement of eq.([7]) is 
beyond the scope of this study and for future challenge. 

Figure m a, b shows Rtp — Rfp as a function of the discrimination threshold 
L* or T* when 9 is given. The rising edges of the three curves coincide with 
each other. L* = —3 to —3.5, and T* — 1.5 to 2 are reasonable if there is no 
prior information on the presence or absence of a missing spreader node. If the 
assumption is grounded well that an index node is missing, there is a distinct 
optimal threshold L* = —4, and T* = 2.1. Figure |4] c, d shows Rtp — Rfp 
when 9 is unknown. Setting L* = —3 to —3.5 is reasonable. The position of 
the falling edges of the curves in c, d are different from that in a, b. There is 
a distinct optimal threshold L* = —3.4 for a missing index node. 

Figure [5] a, b shows the optimal thresholds L* and T* as a function of the 
transmission parameters r (a and /3) and 7. Figure [5] c, d shows the optimal 
thresholds L* and T* as a function of the number of nodes N. The optimal 
thresholds depend on r. This means that it is practical to find the optimal 
threshold experimentally, which is suitable for individual conditions. The de- 
pendence on 7 and N is so small that the dependence can be ignored. 
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Figure 1: Receiver operating characteristic curves of the tail-end discriminator 
(a, c, e), and the mid-body discriminator (b, d, f). The parameters are (fc^) = 2, 
r = 2 {a = 0.067, /3 = 0.033), and 7 = 0.1. J^td) for < d < 99 {D ^ 100) 
with = 1 is given as a dataset. a, b, TV = 10, the missing spreader node nio 
is an index node, and /io(0) = 200. c, d, = 30, n^o is an index node, and 
-^30(0) = 200. e, f, = 10, nio is not an index node but an intermediate node, 
and /o(0) = 200. The curves are drawn from the trials for 100 different random 
network topologies. 
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Figure 2: Receiver operating characteristic curves of the tail-end discriminator 
(a, c, e), and the mid-body discriminator (b, d, f). a, b, Ii{td) for < (i < 
32 (D = 33) is given as a dataset. c, d, r = 8 (a = 0.089, ^ = 0.011). e, f, 
7 = 0.4. The other experimental conditions are the same as those in Figure [1] 
a, b. 
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Figure 3: Receiver operating characteristic curves of the tail-end discriminator 
(a, c), and the mid-body discriminator (b, d). AJ,(<d) for < d < 99 (D = 100) 
with At = 1 is given as a dataset. a, b, the missing spreader node nio is an 
index node, and /io(0) = 200. c, d, nio is not an index node but an intermediate 
node, and /o(0) — 200. The experimental conditions are the same as those in 
Figure [T] a, b, e, f. 
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Figure 4: i?TP ~ ^^fp of the tail-end discriminator (a, c), and the mid-body 
discriminator (b, d) as a function of the discrimination thresholds, a, b, 9 
is given, c, d, is unknown and estimated from the dataset. The missing 
spreader node rtio is either an index node, an intermediate node, or absent. 
The experimental conditions are the same as those in Figure [1] a, b, e, f. 



4.2 WHO dataset 

4.2.1 SARS outbreak in 2003 

SARS is a respiratory disease in humans caused by the SARS corona-virus. 
The epidemic of SARS appears to have started in Guangdong of south China 
in November 2002. SARS spread from the Guangdong to Hong Kong in early 
2003 [Lipsitch 2003| , and eventually nearly 40 countries around the world by 



July [Riley 20^ . WHO archives the cumulative number of reported probable 
cases of SAR^]. The dataset in the archive had been updated every day. It is a 
collection of time sequence data Ji{td) with At = 1 day. 

The target geographical regions in this study are those where five or more 
cases had been reported in a month since March 17. The number of data is D = 
31. They are Canada (CAN), France (FRA), United Kingdom (GBR), Germany 
(DEU), Hong Kong (HKG), Malaysia (MAS), Taiwan (ROC), Singapore (SIN), 
Thailand (THA), United States (USA), and Vietnam (VIE). Mainland China is 
not included because no data are available in some periods, and no data outside 
of Guangdong is reported in other periods. The total cumulative number of 
cases increased 10 times from J(io) = Tli Jii^o) — 165 to J{tzo) = 1; 846 in these 
= 11 regions. The fluctuation in the dataset originates in the observational 
noise partly, which arises from inaccurate diagnosis and the irregular delays in 



^World Health Organization, Cumulative number of reported probable cases of SARS, 
[http://www.who.int/csr/sars/co uritry/e n/index.htnil (2003). 
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Figure 5: Optimal discrimination thresholds of the tail-end discriminator for 
unknown 9 (a, c), and the mid-body discriminator for given 9 (b, d). a, b, as 
a function of r and 7. c, d, as a function of N. The experimental conditions 
are the same as those in Figure [T] a, b. 
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Table 2: Test statistic L for the mid-body discriminator, T for the tail-end 
discriminator, and the mean to^, variance vu, skewness sm, and kurtosis kuu 
of Zi for the 11 regions where a local outbreak is reported in the early growth 
phase of the SARS outbreak. 



Region 


L 


T 


mi 


Vli 






HKG 


-41.3 


2.1 


0.67 


8.7 


0.39 


5.1 


USA 


-16.1 


2.7 


-2.0 


4.8 


0.11 


-0.15 


CAN 


-15.2 


3.1 


-1.7 


4.8 


0.45 


0.78 


SIN 


-8.5 


1.6 


-0.77 


3.4 


-0.41 


0.24 


ROC 


-8.1 


2.7 


-1.7 


3.2 


0.36 


-0.16 


MAS 


-4.5 


3.1 


-1.5 


2.1 


1.2 


3.8 


VIE 


-3.6 


1.5 


-0.89 


2.4 


-2.0 


6.4 


GER 


-2.4 


2.1 


-0.99 


1.4 


0.29 


-0.81 


ERA 


-2.4 


2.0 


-0.85 


1.5 


0.70 


0.47 


THI 


-2.3 


1.8 


-0.76 


1.7 


0.32 


0.12 


GBR 


-1.4 


1.4 


-0.42 


1.5 


0.78 


1.6 


Average 


-9.6 


2.2 


-1.0 


3.2 


0.20 


1.6 



reporting. The dataset is smoothed with a moving average filter [Walker 2010] . 
The window size here is W — 3 O.ID). It is postulated that the empirical 
law in eq. (|14|) holds true. 

Table[2]shows the calculated test statistic Li for the mid-body discriminator, 
Ti for the tail-end discriminator, and the mean m^, variance va, skewness sm, 
and kurtosis Kan of Zi for the 11 regions. The lower order moments averaged over 
the all regions are (m^) = —1.0 and (va) — 3.2. The entire dataset is disturbed. 
The lower order moments tend to be anomalous for United States, Canada, Sin- 
gapore, and Taiwan, while the higher order moments are anomalously large for 
Malaysia and Vietnam. The estimated values of the parameters are a — 0.18, 
/3 = 0.13 (r ^ 1.4), and 7 = 0.13. The optimal threshold L* = -3.6 is ob- 
tained in the experiment for this condition. Perturbation is discovered in the 
time sequence data for Hong Kong, United States, Canada, Singapore, Tai- 
wan, Malaysia, and Vietnam. The data for Hong Kong is disturbed extremely 
strongly. The variance and kurtosis are anomalous particularly. 

Recall that Mainland China is not included in the dataset. The actual 
cumulative number of cases in Guangdong alone exceeded that in Hong Kong 
in the period. Guangdong was a spreader which had not been known to the 
rest of the world in the early growth phase. This is illustrated by an example of 
the uncovered chains of transmission. Two of the index patients in Toronto in 
Canada and another three of the index patients in United States stayed a hotel 
in Hong Kong, where a Chinese nephrologist, who had treated many patients in 
Guangzhou in Guangdong and become infected, was staying in late Eebruarjf^. 

^SARS Expert Committee (Hong Kong), SARS in Hong Kong: from experience to action, 
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This implies that China was influential potentially in infecting Hong Kong, 
United States, and Canada. The large amount of passenger traffic between 
south China and Southeast Asian countries is supposed to affect the spread 
in these countries. On the other hand, the disturbance is not evident in the 
data for European countries. It is highly probable that the discovered signal of 
perturbation originates in the transmission between China, a missing spreader 
node, and north Pacific Rim countries. 

4.2.2 Flu pandemic in 2009 

The flu pandemic in 2009 was a global outbreak of a new strain of the HlNl 
swine influenza A virus. The virus appeared in Veracruz in southeast Mexico, 
in April 2009. The pandemic spread to United States and Canada immediately, 
and then to the South American countries. West European countries, and Pacific 
Rim countries. It began to decline in November. WHO archives the cumulative 
number of the reported laboratory-confirmed cases of the fiu pandemicj^. The 
dataset in the archive had been updated every day. It is a collection of time 
sequence data Ji{td) with At — 1 day. 

The target geographical regions in this study are those where five or more 
cases had been reported in about three weeks since April 28. The number of 
data is D = 25. They are Austraha (AUS), Belgium (BEL), Brazil (BRA), 
Canada (CAN), Chile (CHL), China (CHN), Colombia (COL), Costa Rica 
(CRI), Ecuador (ECU), El Salvador (SLV), France (ERA), Germany (DEU), 
Israel (ISR), Italy (ITA), Japan (JPN), Mexico (MEX), New Zealand (NZL), 
Panama (PAN), Peru (PER), Spain (ESP), United Kingdom (GBR), and United 
States (USA). The cumulative number of cases increased 100 times from J (to) = 
105 to J{t2A) = 11, 129 in these N — 22 regions. The dataset is smoothed with 
a moving average filter whose window size is = 3. It is postulated that the 
law in ea. (|14p holds true for the pandemic. 

Table [3] shows the calculated test statistics, the mean, variance, skewness, 
and kurtosis of Zi for the 22 regions. The lower order moments averaged over the 
all regions are (m^) = —0.11 and {v^) = 1.3. The entire dataset is not disturbed. 
The lower order moments tend to be anomalous for United States and Mexico, 
while the higher order moments are anomalously large for Canada. Irregularly, 
Japan has large absolute values of both lower and higher order moments. The 
higher order moments for South American countries tend to be large while those 
for West European countries are small. This may be related to the seasonality 
for the tropics and north hemisphere. The estimated values of the parameters 
are a = 0.88, /3 = 0.77 (r = 1.2), and 7 = 0.29. The optimal threshold 
L* = —4.2 is obtained in the experiment for this condition. Perturbation is 
discovered in the time sequence data for United States, Mexico, Canada, and 
Japan. The ratio of this number is = 0.18, which is remarkably smaller 
than 7/7V = 0.64 for the SARS outbreak. In addition, the absolute value of L 

|http: / / www.sar s-expertcom.gov.hk/english/reports/reports / reports_fullrpt.html (2003). 

''World Health Organization, Situation updates - Pandemic (HlNl) 2009, 
|http://www.who.int/csr/disease/swineflu/updates/en/index.html| (2010). 
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Table 3: Test statistics L, T, and the mean uii, variance vu, skewness sm, and 
kurtosis kuu of Zi for the 22 regions where a focal outbreak is reported in the 
early growth phase of the flu pandemic. 







T 


HI „■ 








USA 


-17.2 


2.4 


0.07 


5.2 


-0.09 


-0.55 


MEX 


-16.9 


1.8 


-1.7 


4.8 


-0.31 


-0.10 


CAN 


-8.4 


0.84 


0.02 


3.4 


2.6 


9.6 


JPN 


-7.3 


3.3 


-1.5 


2.8 


2.3 


5.3 


GBR 


-2.4 


1.1 


-0.43 


0.74 


0.48 


-0.56 


ESP 


-2.2 


1.0 


-0.05 


0.80 


1.4 


2.4 


PAN 


-1.9 


1.1 


-0.12 


0.92 


1.5 


2.2 


CRI 


-1.6 


1.5 


-0.15 


1.1 


3.0 


9.6 


ERA 


-1.2 


0.93 


-0.02 


0.67 


1.6 


3.6 


DEU 


-1.0 


1.3 


0.21 


0.75 


2.0 


3.4 


COL 


-0.77 


1.3 


-0.02 


0.43 


1.0 


0.72 


ITA 


-0.71 


1.5 


0.17 


0.55 


1.0 


0.54 


CHN 


-0.68 


1.6 


-0.02 


0.25 


0.88 


0.55 


CHL 


-0.65 


1.7 


0.30 


0.84 


2.6 


7.1 


SLV 


-0.61 


1.2 


-0.01 


0.54 


1.3 


1.1 


NZL 


-0.58 


1.4 


0.02 


0.48 


2.2 


5.4 


BRA 


-0.52 


1.4 


-0.03 


0.60 


3.2 


11.6 


ECU 


-0.51 


1.6 


0.16 


0.89 


3.8 


14.4 


PER 


-0.35 


1.9 


0.15 


0.51 


2.3 


6.1 


AUS 


-0.27 


1.7 


0.27 


0.64 


1.9 


2.4 


BEL 


-0.24 


1.8 


0.14 


0.47 


2.1 


3.6 


ISR 


-0.24 


1.8 


0.08 


0.34 


0.81 


0.04 


Average 


-3.0 


1.6 


-0.11 


1.3 


1.7 


4.0 



is much less anomalous than that for the SARS outbreak. In most regions, L is 
close to while L is less than -2 for the SARS outbreak. The disturbance from 
an unknown origin of perturbation is small and localized. Then, what happened 
to the spread in United States, Mexico, Canada, and Japan? 

The national transportation statistics of United Stated reports that most 
passengers cross the borders between United States and Mexico, and United 
States and Canada by land. The annual number of passengers from Mexico to 
United States by land is 24 times larger than that by air in 2009. The number 
from Canada to United States by land is 7 times larger than that by air in 2005. 
The meta-population network model relies on eq.([T4]) which is known valid for 
the world-wide air transportation. The actual heavy land transportation lets 
the probability of movement across the border deviate from the value estimated 

^Research and Innovative Technology Administration, Bureau of Transportation Statistics, 
USA, I http://www.bts.gov/publi cations/ national_transportation_statisticsl (2010). 
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in the model. This is supported by the result that the fraction 7 is twice as 
large as that for the SARS outbreak. The deviation imposes an impact on the 
speed of the spread at such cities near the border as San Ysidro CA, then inland 
cities, and finally such cities near another border as Buffalo-Niagara Falls NY 
[Balcan 2009] . The discovered signal of perturbation for United States, Mexico, 
and Canada could originate in a localized exception to the empirical law for the 
probability of movement, rather than a missing spreader node. 

The onset of infection in Japan started suddenly on May 9 after many trav- 
elers had returned from North America during the Japan's two week long fes- 
tive break. Flu spread explosively among high school students in Osaka and 
Hyogo. The effective reproductive ratio reached the peak value around May 14 
[Nishiura 2009] . The ratio is significantly higher than 1.4 to 1.6 from an epidemi- 
ological analysis, or 1.2 from a genetic analysis, in Mexico and other countries 
|Fraser 2009] . School closure started on May 17 just after the secondary trans- 
mission had been confirmed and announced ofRcially. The effective reproductive 
ratio declined below 1. The reproductive ratio in Japan rose and fell sharply 
in 10 days in early May. The precondition of the SIR compartment model is 
that the values of the transmission parameters do not depend on time and sub- 
populations. The discovered signal of perturbation for Japan could originate in 
an accidental localized anomaly on the probability of infection, rather than a 
missing spreader node. 

The above results demonstrate the potential capability of the discriminator 
in discovering unknown origins of perturbation which violate the preconditions 
of the mathematical models. On the other hand, the signal discovered by the dis- 
criminators satisfies the sufhcient condition to ascertain the presence of a missing 
spreader node incompletely, although it satisfies the necessary conditions. Prior 
knowledge on the demographic and socioeconomic nature of individual regions 
makes up for the incompleteness. This is similar to the false positives where the 
signal identified an early warning for a catastrophe originates in wide classes of 
impending transitions because the underlying mechanism is understood incom- 
pletely IScheffer 2009^ . It is also of interest to see if the discovered location of a 
missing node satisfies the criterion for a strongly influential spreader in the core 
of the network JKitsak 201 Of . Is it possible to distinguish the presence of a miss- 
ing spreader node from an anomaly on the parameters mathematically? The 
solution will be worked out by examining how possible origins of perturbation 
appear in respective order moments of the disturbed time sequence data, and 
by finding a suitable backward mapping from the pattern of disturbance to the 
possible origins. The criterion for discrimination may be a complex function of 
the preconditions of a mathematical model, experimental conditions, and given 
datasets, rather than a simple scalar threshold of a test statistic in this study. 
These issues are for future challenges. 



20 



5 Conclusion 



This study poses an intriguing problem which few studies have addressed before. 
The problem is a node discovery problem for the spread of an infectious disease. 
Two statistical discriminators are invented to solve the problem. 

The performance of the discriminators is studied with the computationally 
synthesized datasets. The findings are as follows. The tail-end discriminator 
is excellent both when the parameters is given and unknown. The mid-body 
discriminator is the most suitable when the parameters are given. Discovering 
a missing intermediate node is more difficult than a missing index node, but 
possible. The performance depends neither on the speed of the spread, on 
the size of the network, nor on the number of data much. If the cumulative 
number of new cases is given as a dataset, discrimination is moderately excellent, 
and interestingly, the tail-end discriminator works more excellently when the 
parameters are unknown than when they are given. The optimal thresholds 
for discrimination depend solely on the reproductive ratio (the probability of 
infection and recovery). 

The WHO dataset on the SARS outbreak and flu pandemic are analyzed 
with the discriminators. The findings are as follows. The entire dataset on 
the SARS outbreak is disturbed. The signal of perturbation from an unknown 
origin is discovered in the time sequence data for Hong Kong, United States, 
Canada, Singapore, Taiwan, Malaysia, and Vietnam. The data for Hong Kong 
is disturbed extremely strongly. It is highly probable that the discovered signal 
originates in the transmission between China, a missing spreader node, and 
north Pacific Rim countries. In the flu pandemic, the signal of perturbation is 
discovered in the time sequence data for United States, Mexico, Canada, and 
Japan. The ratio of the number of these regions is much smaller than that for 
the SARS outbreak. The test statistics are much less anomalous than those for 
the SARS outbreak. The disturbance in the dataset is small and localized. The 
signal for United States, Mexico, and Canada could originate in the heavy land 
transportation which is an exception to the empirical law for the probability of 
movement. The signal for Japan could originate in an accidentally sharp rise 
and fall of the probability of infection. 

Two advanced problems still remain unsolved. One is raising the accuracy 
of discrimination from the cumulative number of new cases as well as from the 
number of cases. The other is deriving the criteria to distinguish the presence 
of a missing spreader node from an anomaly on the parameters. Solving them 
is a milestone to the theory on the stochastic reaction-diffusion process where 
hidden external or internal entities are at work. These issues are for future 
challenges. 

A Time evolution of moment 

The time evolution of (t) is given by the Langevin equations in eq. (|17l) . The 
fluctuation terms £.^''\t), S.^'^^t), and £.^^\t) are Gaussian white noises. Their 
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explicit functional forms are unknown. The equations are applicable to any 
nodes including missing nodes. 

j 3 



In most cases, the outbreak is contained before the spread reaches equilib- 
rium. In the early growth phase of the outbreak, li <C Si and Ri ^ 5^ hold 
true. Eq.([I71) becomes ea,.^ jMaeno 2010] . 

^ = a/,W~/3/.W+E7.-^^.(i)-E^^^-^^W 



E V7,./,(oeS'(t) - E ^i^^hmwit)- (18) 



J 3 

The time evolution of Ji{t) is given by eq. p^ . 



'^^ah{t) + V^)t\t). (19) 
Eg. pT]) is equivalent to the Fokker-Planck equation in eq. ipO)) . 

'-^ = -Y.§rS^-'MPV.i)^\Y.^,^-L "'Mpc. ')- (20) 

The coefficients a^p and 6ijp in eq. ([20l) are given by eq. (f2T|) and (f22|) . 

flip = (a - ;5 - E'7»i')'^«P (^l) 



bijp = {(a + /3 + E li3')^^P + 7pJ<^ii - 7ij<5ip - 7ji'5jp- (22) 

3' 

The moments satisfy eg. (1^51) through The explicit functional forms of 

lower order moments are necessary to obtain higher order moments. 

_^=^a,pmp(t|0). (23) 
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p p 

dsijk{t\e) ^ Y{aipSpjkit\6) + ajpSpikit\0) + akpSpij{t\0)) 
p 

+ ^i'^ijp'^pk{t\d) + bikpVpj{t\6) + bjkpVpi{t\0)). (25) 
p 

AK.ijki{^\6) ^ '^[aipKpjkiiAO) + ajpK.piki{t\6) + akpKpiji{t\6) + aipKpijk{t\6)) 
p 

+ 'YibijpSpki{t\0) + bikpSpji{t\e) + biipSpjk{t\d) 
p 

+ bjkpSpii{t\6) + bjipSpik{t\0) + bkipSpij(t\e)). (26) 

The solutions of eq. ([23l ) through ([26)) are given by eq.(|27|) through ([30]) when 
At is smaU. 

m,(td+i|6>) = J,(td) + ^a,p/p(td)At + 0(At2). (27) 
p 

v,j [td+i \e) = Kphitd)^ + 0(At2). (28) 

p 

Sijk(td+l\0) = — '^^(bijpbpkq + bikpbpjq + bJkpbp^q)Iq{td)At^ + 0{At^). (29) 
p,g 



Kijkl{td+l\9) — g y^^jbjjpjbpkqbqlr + b 

plq^qkr ^klq^qpr) H~ ^ikpippjq^qlr ~(~ ^plq^qjr H~ ^jlq^qpr) 

p,q,r 

^ilpippjq^qkr ^pkq^qjr ^jkq^qpr^ H" ^jkpQ^piq^qlr ~^ ^plq^qir H~ ^ilq^qpr^ 
~l~ ^jlpippiq^qkr ^pkq^qir ~^ ^ikq^qpr^ H" ^klpippiq^qjr H" ^pjq^qir H~ b^jq^qpr^^-^ri^^d^^^ 

+ 0{M^). (30) 

B Disturbed moment 

The diagonal elements of the moments are derived for the simple network in l3.2l 
which consists of n^, rta, and n^. Eq. (|?7|) through ([50]) become eg. ([51]) through 
(01 for n„. 

m„(<rf+i|6>,7') « J„(i<i) + {(a-/?)/„(i<i)-7(^(^<i)-/a(td))}At 

- i{In{td)-h{td))At. (31) 
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Vnn{td+l\e,-f') « {{a + P)In{td)+7{Initd)+Utd))}At 

+ y{In{td)+Is{td))At. (32) 



Snnn{td+l\9,l') « -{{a + f3f 1^ + j{a + I3){21n{td) + Utd))}Af 

+ /3)(2/„(id) + Isitd)) + 7(2/n(td) + h{td) + Isitd))} + 0(7")]Ai 



'2 

(33) 

Knnnn{td+l\e, 7') ~ 3{(a + /3)3/n(id) + l{a + (3/„(i<j) + /a(td)) + 7'(a + /3)(/n(irf) + /a(id))}Ai' 

+ [7'{3(a + I3)\31n{td) + Isitd)) + 67(a + /3)(3/„(td) + /a(id) + Is{td)) 

+ 7'(3/„(id) + /a(td) + 21,{td))} + 0{Y')]At\ (34) 

Eq.^ through (gHl) become eq.dSS]) through ^ for ria- 

ma(td+i|e,7') « 4(id) + {(a - ^)h{td) - l{ISd) - /n(id))}Ai. (35) 



vUtd+i\e,i) « {(a + /3)/a(M +7(4(t<i) +/„(id))}Ai. (36) 
saaa(id+i|e, V) « ^K" + /3)'4(td) + l{a + /3)(2/a(id) + /„(i<i))}Ai^ (37) 

Kaaaa(i<i+i|e, V) « 3{(a + P)'' Utd) + 7(a + /5)'(3/a(td) + /n(td)) + 7'(a + mSd) + /n(id))}Ai3 
+ ii\Utd) + h{td))M\ (38) 



C Measured moment 

The moments of are measured under the same experimental conditions as 
those for Figure [1] a, b, e, f. The parameter 6 is either given, or unknown and 
estimated from the dataset. The foUowing table shows the measured moments 
when a missing spreader node is absent. The values are the average over the all 
nodes and the trials for 100 different random network topologies. 



Parameter 


{rrii) 


{Vii) 






Given 


0.088 


1.0 


0.15 


0.22 


Unknown 


0.043 


1.0 


0.11 


0.062 



The following table shows the measured moments when the missing spreader 
node nio is an index node. 



Parameter 




{Vvi) 




{^iiii ) 


Given 


Neighboring node 


1.2 


1.8 


2.2 


9.8 


Non-neighboring node 


-0.016 


1.2 


0.72 


2.2 


Unknown 


Neighboring node 


0.19 


0.92 


1.3 


4.6 


Non-neighboring node 


0.071 


0.78 


0.39 


0.71 
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The following table shows the measured moments when mo is an intermedi- 
ate node. 



Parameter 


{rrii) 


{Vii) 






Given 


Neighboring node 


0.61 


1.2 


0.062 


0.13 


Non-neighboring node 


-0.16 


1.0 


0.17 


0.30 


Unknown 


Neighboring node 


0.11 


1.1 


0.056 


0.011 


Non-neighboring node 


0.0087 


1.0 


0.16 


0.10 
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